Gastrointestinal tract disorders classification using ensemble of InceptionNet and proposed GITNet based deep feature with ant colony optimization

Computer-aided classification of diseases of the gastrointestinal tract (GIT) has become a crucial area of research. Medical science and artificial intelligence have helped medical experts find GIT diseases through endoscopic procedures. Wired endoscopy is a controlled procedure that helps the medical expert in disease diagnosis. Manual screening of endoscopic frames is a challenging and time-consuming task for medical experts, and it also increases the miss rate of GIT diseases. Early diagnosis of GIT disease can save human beings from fatal diseases. An automatic deep feature learning-based system is proposed for GIT disease classification. Adaptive gamma correction with weighting distribution (AGCWD) preprocessing is the first stage of the proposed work and is used to enhance the intensity of the frames. Deep features are extracted from the frames by two deep learning models, InceptionNetV3 and GITNet. An Ant Colony Optimization (ACO) procedure is employed for feature optimization, and the optimized features are fused serially. Classification is performed by variants of the support vector machine (SVM), including the Cubic SVM (CSVM), Coarse Gaussian SVM (CGSVM), Quadratic SVM (QSVM), and Linear SVM (LSVM) classifiers. The proposed model is assessed on two challenging datasets, KVASIR and NERTHUS, which consist of eight and four classes, respectively. The proposed model outperforms existing methods, achieving an accuracy of 99.32% on the KVASIR dataset and 99.89% on the NERTHUS dataset.


Introduction
The GIT is a vital part of the digestive system that is vulnerable to many infections. Infections can lead to serious illness if not detected and treated at an early stage [1]. Colorectal cancer is a type that can be prevented in most cases with effective measures, such as colorectal lymph nodes that can be detected, diagnosed, and treated in the early stages [2]. Different kinds of cancer usually result in 70% of deaths in middle-income countries because they are not detected and treated in the early stages [3][4][5]. The countless distinct irregular mucosal findings in the GIT may range from moderate annoyances to deadly diseases [6]. According to statistics compiled by the World Health Organization (WHO), a huge number (1.93 million) of cases of colon and rectum cancer occurred around the globe. The WHO reported a large number of deaths (916 000) and 1.09 million cases of stomach cancer in 2020 [7]. Insufficient understanding of the standardized classification of such cancers from endoscopic imagery contributes to the enormous number of fatalities worldwide. Endoscopic methods such as wireless capsule endoscopy (WCE) and wired endoscopy (WE) are used to find illnesses in the GIT [8]. The WCE procedure is time-consuming and cannot be controlled by the operator, whereas WE is a controlled procedure. An operator can focus on a specific region of the GIT by adopting the WE procedure [9]. Colonoscopy is used to scan the lower portion of the GIT, and WE procedures are used to examine both the lower and upper portions of the GIT [10]. During the endoscopic procedure, videos are captured and examined by the medical expert for disease analysis and detection [11].
GIT images pose multiple challenges, including large amounts of clutter, occlusions, complex backgrounds, variable lighting conditions, high inter-class similarity among objects of different classes, and large intra-class variation among the objects of a single class. An experienced medical specialist is required to handle these challenges. Examining the large number of video frames is also a challenging task that increases the workload of medical experts. An automatic system is required that can classify the frames of GIT diseases accurately and in less time than a gastroenterologist. Developing such an automated procedure using machine learning algorithms is a challenging job. A precise Computer Aided Diagnostic System (CADx) can aid specialists in disease classification decision-making [12][13][14]. The accuracy of the disease classification is the most important factor [15]. In recent years, various artificial intelligence (AI) systems have been developed for diagnosing different kinds of diseases in the GIT. CADx has become a very important part of research in medical and computer science for disease classification using imagery datasets. Deep learning methods are growing rapidly in the healthcare system for disease classification [16][17][18]. An automatic disease detection and classification approach is introduced in [19]. Popular deep learning models (InceptionNetV3, VGG, ResNet, and GoogLeNet) that are already trained on millions of images are being used in research work for disease detection and classification. Convolutional neural networks (CNNs) are emerging tools in machine learning for deep feature learning and classification [20]. The main goal of this research is to introduce a computer-aided GIT disease classification system that assists the healthcare system in reducing death rates by diagnosing GIT diseases at an early stage.
In this manuscript, a deep learning-based GIT disease classification system using endoscopic frames is proposed. Based on extensive experimentation, we finalized a model in which the AGCWD preprocessing technique is applied and the resultant image is fed to CNN-based networks. To obtain better results, features are acquired from two CNN networks: one (InceptionNetV3) is a pre-trained network, while the other (GITNet) is designed by ourselves. The features are then optimized by the ACO technique and fused serially. In the last stage, classification is performed by variants of SVM classifiers. The results show the effectiveness of the proposed solution. The AGCWD enhances the visual quality of the frames. Feature extraction is performed by the deep learning models InceptionNetV3 and GITNet, which encode features automatically. The features are then optimized and fused serially, and the fused features are supplied to the classifiers. The CGSVM, LSVM, CSVM, and QSVM classifiers are employed, of which the QSVM classifier provides the best results in terms of accuracy. The major contributions of the article are as follows:
• The AGCWD technique is employed as a preprocessing phase for enhancing the pixel intensity level, which helps to obtain better features for disease classification.
• The proposed deep learning model GITNet is trained and tested using a third-party dataset, i.e., CIFAR-10 [21]. The features are acquired from the fully connected layers of GITNet and an existing pre-trained model named InceptionNet.
• A bio-inspired approach named ACO is applied for feature subset selection, and the features are fused serially and passed to the SVM-based classifiers for classification.
The remainder of the manuscript is organized as follows: The related work is presented in Section 2. The materials and methods are explained in Section 3. The results of the experiments are elaborated in Section 4. Lastly, Section 6 concludes the manuscript and describes future work.

Related works
The most challenging problem for gastroenterologists is differentiating between healthy and harmful images, such as ulcers, bleeding, and polyps [22,23]. Different methods have been proposed for classifying endoscopic frames, and the most recent ones are addressed in this section. The image acquisition and preprocessing approaches are briefly described first. The images of GIT disease are acquired by the procedure of endoscopy. Most endoscopic frames contain low-intensity pixels and noise [24]. The noise is eliminated by thresholding segments of the frames and by image clipping methods. Image clipping divides the background into tiny noise spots, like edge detection. Simple and median filters reduce the noise in the frames by designing an effective noise filter [25]. Many color processing methods use various color spaces, including RGB [26], CIEXYZ [27], CIELAB [28], YIQ [29], HSV [30], YUV [31], and HIS [32]. Color statistics and moments are also used to describe the color space. The preprocessed frames are processed further to extract features.
A variety of feature extraction approaches are employed in the existing work, including point features that represent the features of objects geographically [33][34][35], texture features that define the object surface as fine, coarse, smooth, or grained [36], HOG features that focus on the shapes of the objects [37], and color features [38]. A combination of deep features enhances the performance of the system [39]. In one research work, the deep features of two CNN models are also integrated [40]. CNN models extract the features from the deep layers [41]; such models include AlexNet [42], VGG-16 [43], ResNet [44], and InceptionNetV3 [45]. A distributed deep learning method is employed for disease detection [46]. Diseases of the GIT are detected and classified by using a deep learning model for feature extraction [47]. Deep features extract the information and synthesize it using skip connections from the previous layers [48]. The most appropriate features offer high accuracy in categorization outcomes. Redundant features are usually present in the extracted feature set, which affects the classification results. To remove this redundancy and reduce the computation cost of the classifiers, the ACO method is adopted to optimize the features. ACO optimizes the features for ulcerated lesion classification [49]. A metaheuristic technique is employed for feature optimization [50]. There are three methods for feature selection and dimensionality reduction: the filter method, which maps relationships between input and target variables using a statistical approach; the wrapper method, which uses a specific machine learning technique for feature selection; and the embedded method, which integrates the wrapper and filter methods [51]. Feature fusion in automated computer vision and image processing systems for illness classification is crucial and supports the achievement of more accurate and effective results [40]. In the hybrid approach, several pieces of literature address the fusion of handcrafted and deep features to provide better results. In this approach, deep features are fused for stomach disease classification [52]. In the existing work, the class of techniques that exploit hybrid methodologies is growing. Different classifiers are trained using fused handcrafted and deep features [53]. The k-Nearest Neighbor (KNN) classifier with SVM is employed for GIT disease classification [54] and for disease classification more generally [55]. A CNN with a multi-scale feature fusion approach is applied for colon cancer identification [56]. VGG16 and VGG19 are employed with handcrafted methods in hybrid mode for GIT disease classification [16].
The existing literature reveals that researchers employ hybrid approaches for the classification of GIT diseases but obtain lower accuracy. Techniques that fuse deep features are used very rarely. So, there is still a need to design a framework that can more accurately differentiate endoscopic frames of GIT diseases.

Proposed methodology
The proposed approach, which comprises five stages, is presented in this section. In the first stage, the AGCWD technique is used for image enhancement. In the second stage, a new CNN-based GITNet model is designed, and deep features are obtained from the pre-trained InceptionNetV3 and GITNet models. In the third stage, the features are optimized by the ACO method. In the fourth stage, the serial feature fusion procedure is adopted. In the fifth and last stage, classifiers are used to classify the diseases of the GIT using the fused feature sets. An overview of the proposed GIT disease classification approach is depicted in Fig 1.

Preprocessing
Preprocessing is essential for improving the visualization of endoscopy frames. The enhancement of image contrast is therefore a vital stage for disease classification. The major benefit of this phase is that it yields more powerful and relevant features for accurate classification. In this work, Adaptive Gamma Correction (AGC) is employed with the weighting distribution (WD) function, referred to as AGCWD, which enhances the intensity of the frames [57]. In the AGCWD method, the image is transformed from the RGB to the HSV color space. The color content in HSV is specified by the hue (H) and the saturation (S), which are preserved, whereas the luminance intensity (V) component is adjusted for contrast enhancement, as shown in Fig 2. The function of WD is to marginally alter the statistical histogram and reduce the development of negative consequences. Following [57], the WD function is formulated as:

$$pdf_{WD}(I) = pdf_{max}\left(\frac{pdf(I) - pdf_{min}}{pdf_{max} - pdf_{min}}\right)^{\alpha}$$

where $pdf_{WD}(I)$ is the probability density function (pdf) of the weighting distribution of the input image and $\alpha$ is the adjusted parameter. $pdf_{max}$ and $pdf_{min}$ are the maximum and minimum pdf of the statistical histogram, respectively. The cumulative distribution function (cdf) depends on the pdf and is expressed as:

$$cdf_{WD}(I) = \sum_{i=0}^{I} \frac{pdf_{WD}(i)}{\sum pdf_{WD}}$$

where the quantity $\sum pdf_{WD}$ is stated as follows:

$$\sum pdf_{WD} = \sum_{i=0}^{I_{max}} pdf_{WD}(i)$$

The plot of the pdf and $pdf_{WD}$ is illustrated in Fig 3. The AGC function uses the cdf and pdf for the intensity transformation. The formulation of the AGC is given as:

$$P(I) = I_{max}\left(\frac{I}{I_{max}}\right)^{\gamma}$$

where $I_{max}$ is the maximum intensity of the input and $P(I)$ is the transformed intensity of each pixel of the image. The main advantage of the AGC is that it gradually increases the intensity of low-intensity pixels while avoiding a major decrease in the intensity of high-intensity pixels. The gamma parameter is formulated as follows:

$$\gamma = 1 - cdf_{WD}(I)$$

The plot of the luminance intensity, the output of the AGC function $P(I)$, and the enhanced image after the gamma operation are depicted in Fig 4.
The AGCWD preprocessing method is applied to the KVASIR dataset. Some enhanced frames of the KVASIR dataset are illustrated in Fig 5.
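The AGCWD steps described above (histogram, weighting distribution, cumulative distribution, and per-level gamma) can be sketched as follows. This is a minimal illustration rather than the authors' MATLAB implementation: it assumes an 8-bit luminance (V) channel supplied as a list of rows, and the function name `agcwd` and the default `alpha=0.5` are hypothetical choices. The RGB-to-HSV conversion is left out.

```python
def agcwd(v_channel, alpha=0.5, levels=256):
    """Enhance an 8-bit luminance channel given as a list of rows of ints.

    Sketch of adaptive gamma correction with weighting distribution.
    `alpha` is the adjusted parameter; 0.5 is an assumed default.
    """
    flat = [p for row in v_channel for p in row]

    # Histogram and pdf of the intensity levels.
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    pdf = [h / len(flat) for h in hist]
    pdf_min, pdf_max = min(pdf), max(pdf)
    if pdf_max == pdf_min:  # perfectly flat histogram: nothing to reweight
        return [list(row) for row in v_channel]

    # Weighting distribution applied to the pdf.
    pdf_wd = [pdf_max * ((p - pdf_min) / (pdf_max - pdf_min)) ** alpha
              for p in pdf]
    total = sum(pdf_wd)

    # Build a lookup table: gamma = 1 - cdf_wd, P(I) = I_max * (I / I_max) ** gamma.
    i_max = levels - 1
    lut, running = [0], pdf_wd[0]  # level 0 always maps to 0
    for i in range(1, levels):
        running += pdf_wd[i]
        gamma = max(0.0, 1.0 - running / total)
        lut.append(round(i_max * (i / i_max) ** gamma))

    return [[lut[p] for p in row] for row in v_channel]
```

Because the gamma value never exceeds 1, each output level is at least as bright as its input level, so dark frames are lifted while bright pixels are not pushed down.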

Deep feature extraction
Deep features acquired from the endoscopic frames are very important for computer vision tasks. Deep learning is an emerging technology that is combined with computer vision and image processing for disease classification [58]. A CNN consists of different layers (Input, Convolutional, Batch Normalization, Fully Connected, and ReLU layers). The convolutional layer receives the data from the CNN input layer, and the weights of the network are calculated. The inactive neurons are detached, and the activation function is applied via the ReLU layer. In this work, two CNN models, InceptionNetV3 and GITNet, are employed for feature extraction. The following sections describe the deep learning architectures.

InceptionNetV3 model.
InceptionNetV3 is a deep learning architecture that performs well for categorization. The architecture consists of a directed acyclic graph (DAG) having 316 layers, 350 connections, and 94 convolutional layers [59]. Different significant features are obtained by applying several masks on different layers of the model. Compared with conventional CNN models, InceptionNetV3 is a diverse model that allows the …

In this manuscript, the deep learning approach is employed. Training a new deep learning model takes a long time, while a pre-trained deep learning model saves time for feature extraction. In the training phase, the input images are fed to the convolutional layer to obtain the feature maps. Different filter sizes are employed in the model, including 1x1, 3x3, and 5x5. A large kernel size is considered suitable for collecting information that is distributed globally in the frames, while a small kernel size collects information locally. Activation functions are applied for scaling the data, and features are derived from the different blocks of the model. InceptionNetV3 is characterized by convolutional blocks (CB) and large convolutional blocks (LCB), where feature maps from distinct paths are concatenated as the input of the following module. The features, with a dimension of 4000x2048, are taken from the fc7 layers. These features are optimized and fused before the training of the classifiers. At the end of the inception module, this network additionally employs global average pooling and fully connected structures. The softmax layer is used at the end of the model with a 1000-way global average pooling layer and fully connected layers. The correlation statistics are assessed step by step as the best network topology is built, resulting in highly correlated outputs. The resulting feature set is denoted $X_{InceptionNet}$, and its dimension is kept as M×N.

The dimensionality of the features extracted from the fc12 layer is 4000x4096. The complete feature set obtained from GITNet is denoted $Y_{GITNet}$, with dimension M×N. Variants of filter size, filter depth, max-pooling, and stride size are used across the network to deal with the large variety of objects having features in the frames. GITNet explores semantic and mutual information by employing varying filter depths and filter sizes. The layered summary of the complete GITNet model is given in Table 1. Filter sizes of 5x5, 1x1, and 3x3 are used with varying padding and stride sizes.

Feature selection and fusion
The features extracted from GITNet and InceptionNetV3 are optimized by the ACO method, which reduces the redundant features in the $X_{InceptionNet}$ and $Y_{GITNet}$ feature sets. The extracted features contain a lot of redundant information, which reduces model performance. The ACO algorithm eliminates the unnecessary information in the sample set by using a probabilistic technique. The advantage of using an optimizer is that it reduces the computation cost of the classifier. The optimized features of InceptionNetV3 and GITNet obtained from the ACO method are denoted $B_{InceptionNet}$ and $B_{GITNet}$, respectively. After that, the optimized features are fused serially into $F$, the combined feature set of the two deep learning models, with dimension M×N. The detailed form of the feature fusion method is depicted in Fig 10. The same input image, resized differently, is given to both deep learning models. GITNet and InceptionNetV3 extract the deep features by using the CNN method. The extracted features are optimized by the ACO method and fused serially. The classifiers are trained on these features.

The best result is acquired by the QSVM, with 99.32% accuracy on the KVASIR dataset. The parameters of all classifiers are arranged such that the kernel scale is adjusted automatically, the box constraint level is set to 1, data standardization is set to true, and the multiclass method of each classifier is set to one-vs-one.
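The selection and fusion steps can be illustrated with a small sketch. The fitness function and ACO settings used in this work are not stated in the text, so everything below is an assumption made for illustration: the relevance score (per-column variance), the pheromone update, and the names `aco_select`, `column_variances`, `select_columns`, and `serial_fusion` are hypothetical. Serial fusion is taken to mean concatenating the two selected feature vectors of each sample.

```python
import random

def column_variances(rows):
    """Per-feature variance, used here as a stand-in relevance score."""
    n = len(rows)
    scores = []
    for col in zip(*rows):
        mean = sum(col) / n
        scores.append(sum((x - mean) ** 2 for x in col) / n)
    return scores

def aco_select(relevance, k, n_ants=10, n_iter=20, rho=0.2, seed=0):
    """Pick k feature indices with a simplified ant colony search.

    Each ant samples k distinct features with probability proportional to
    pheromone * relevance; the best subset found so far reinforces the trail.
    """
    rng = random.Random(seed)
    tau = [1.0] * len(relevance)  # pheromone per feature
    best_subset, best_score = None, float("-inf")
    for _iteration in range(n_iter):
        for _ant in range(n_ants):
            candidates = list(range(len(relevance)))
            subset = []
            for _step in range(k):
                weights = [tau[j] * (relevance[j] + 1e-12) for j in candidates]
                pick = rng.choices(candidates, weights=weights, k=1)[0]
                candidates.remove(pick)
                subset.append(pick)
            score = sum(relevance[j] for j in subset)
            if score > best_score:
                best_subset, best_score = sorted(subset), score
        tau = [(1.0 - rho) * t for t in tau]  # evaporation
        for j in best_subset:                 # deposit on the best trail
            tau[j] += best_score
    return best_subset

def select_columns(rows, indices):
    return [[row[j] for j in indices] for row in rows]

def serial_fusion(rows_a, rows_b):
    """Concatenate the two feature vectors of each sample."""
    return [a + b for a, b in zip(rows_a, rows_b)]
```

With features of dimension 4000x2048 and 4000x4096, selecting for example 500 columns from each set would give a fused matrix of 4000x1000 under this reading; how the 200, 500, and 1000 features of the experiments are split between the two models is not specified here.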

Results and discussion
Of the several experiments performed, the best three are presented in this manuscript. A detailed description of the KVASIR and NERTHUS datasets is given in Section 4.1. Different SVM variants are used for classification, where QSVM and CSVM provide better results than the other SVM classifiers. The classifiers are trained and tested using 5-fold cross-validation. The experiments are carried out on an NVIDIA GTX 1070 GPU. The Core i5 machine has 8GB of RAM and runs the Windows 10 platform. MATLAB 2021a is used for all experimentation. This section describes the datasets and the performance evaluation method, together with numerical results and visualizations.
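As an illustration of the 5-fold protocol, the sketch below splits sample indices into five folds and uses each fold once for testing. It is a plain, unstratified split with hypothetical names (`k_fold_indices`, `seed`); the exact partitioning used by the MATLAB tooling in this work is not described and may differ.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Return k (train, test) index pairs covering every sample once as test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits
```

For the 4000 KVASIR images, each split under this scheme holds 3200 training and 800 test samples.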

Datasets
In this study, the KVASIR and NERTHUS datasets are employed for assessment. The KVASIR dataset consists of 4000 images [61]. The labeling of these images was carried out by an expert endoscopist. Each labeled class consists of 500 images, which makes for a balanced dataset. Deep models require a large dataset for better feature extraction, so data augmentation is carried out in the proposed work. The classes of the KVASIR dataset include "Z-line", the area where the esophagus transitions to the stomach; "Pylorus", the site of the stomach-duodenum entry; and "Cecum", the proximal segment of the large intestine. These three classes belong to the anatomical landmarks section. Accurate classification of these classes provides substantial help in achieving efficient navigation inside the GIT. Three more classes, labeled "Esophagitis" (an abnormal condition of the esophagus), "Polyps" (lesions affecting the bowel), and "Ulcerative Colitis" (inflammatory conditions of the large bowel), consist of images related to pathological findings. Besides these six classes, two more classes belong to Endoscopic Mucosal Resection (EMR) related conditions. These are labeled "dyed and lifted polyps" (lesions before removal, injected with blue-colored saline) and "dyed-resection-margins". The resolution of the frames in the KVASIR dataset varies from 720 x 576 pixels to 1920 x 1072 pixels. The NERTHUS dataset is produced by colonoscopy (endoscopic examination of the bowel). NERTHUS comprises four classes with 5525 frames of bowel from 21 videos [62].
Detailed information on the KVASIR and NERTHUS datasets is given in Table 2. Samples of the eight classes of the KVASIR dataset are illustrated in Fig 11.

Performance evaluation protocols
This study deals with a multiclass classification problem in which the models are assessed by various standard evaluation metrics. Accuracy quantifies the proportion of correct predictions over all samples. Precision is the proportion of positive predictions that are correct. Sensitivity, or recall, is the proportion of actual positives that are correctly identified. The F1 score is the harmonic mean of precision and recall. The geometric mean indicates the central tendency. The mathematical expression of each evaluation protocol is given below.
Correct predictions of the positive class are true positives ($T_{positive}$). Correct predictions of the negative class are true negatives ($T_{negative}$). Wrong predictions of the positive class are false positives ($F_{positive}$). Wrong predictions of the negative class are false negatives ($F_{negative}$).
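The standard definitions of these measures can be written compactly in terms of the four counts. The sketch below assumes the usual textbook formulas and takes the geometric mean to be the square root of sensitivity times specificity; the function name `classification_metrics` is hypothetical.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Compute the six measures from true/false positive/negative counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sens = tp / (tp + fn)               # sensitivity (recall)
    spec = tn / (tn + fp)               # specificity
    prec = tp / (tp + fp)               # precision
    f1 = 2 * prec * sens / (prec + sens)
    gm = math.sqrt(sens * spec)         # assumed definition of G.M
    return {"Acc": acc, "Sens": sens, "Spec": spec,
            "Prec": prec, "F1.S": f1, "G.M": gm}
```

The assumed G.M definition agrees with the values reported for CSVM in experiment 1: with Sens of 80.21% and Spec of 92.66%, the square root of 0.8021 × 0.9266 is about 0.8621, matching the reported G.M of 86.21%.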

Experimental setup
The convolutional networks InceptionNetV3 and GITNet are used for feature learning. The deep models are trained using two datasets, KVASIR and NERTHUS, containing 4000 and 5525 images, respectively. The features extracted from the deep models are optimized and fused serially. Several experiments are performed in this manuscript, but only the results of the best three, with feature sets of 200, 500, and 1000, are presented. Four classifiers are considered (LSVM, CGSVM, CSVM, and QSVM), which provide better results in terms of Acc. The results of the models achieving high Acc, namely 91.10%, 98%, and 99.32%, are given in Table 3. The prediction speed of the best classifiers in the three experiments is provided in terms of observations per second (obs/sec).

Experimental detail using KVASIR and NERTHUS datasets
The CSVM classifier provides the highest accuracy in experiment 1 compared with the other classifiers, as shown in Fig 12. The CSVM achieves an Acc of 91.01%, Sens of 80.21%, Spec of 92.66%, Prec of 60.94%, F1.S of 69.26%, and G.M of 86.21%, which is the best result in this experiment. The CGSVM achieves an Acc of 90.28%, Sens of 78.01%, Spec of 92.03%, Prec of 58.31%, F1.S of 66.72%, and G.M of 84.72%. In the same way, the QSVM achieves an Acc of 90.61%, Sens of 79.61%, Spec of 92.17%, Prec of 59.23%, F1.S of 67.92%, and G.M of 85.66%. The LSVM achieves an Acc of 90.55%, Sens of 77.41%, Spec of 92.43%, Prec of 59.36%, F1.S of 67.19%, and G.M of 84.58%.

Experiment 2: Results with 500 features on KVASIR dataset
In experiment 2, the classifiers are trained and tested using a set of 500 optimized features. The QSVM classifier provides the highest accuracy in experiment 2 compared with the other classifiers, as shown in Fig 13. The CSVM achieves an Acc of 97.32%, Sens of 96.41%, Spec of 97.46%, Prec of 84.41%, F1.S of 90.01%, and G.M of 96.93%, which shows clear progress over experiment 1. The CGSVM achieves an Acc of 93.03%, Sens of 85.81%, Spec of 94.06%, Prec of 67.35%, F1.S of 75.46%, and G.M of 89.83%. The QSVM achieves an Acc of 98.01%, Sens of 98.21%, Spec of 97.97%, Prec of 87.37%, F1.S of 92.47%, and G.M of 98.09%. The LSVM achieves an Acc of 97.15%, Sens of 97.21%, Spec of 97.14%, Prec of 82.94%, F1.S of 89.51%, and G.M of 97.17%.

Experiment 3: Results with 1000 features on KVASIR dataset
In experiment 3, the classifiers are trained and tested using a set of 1000 optimized features. The QSVM classifier provides the highest accuracy in experiment 3 compared with the other classifiers. The CSVM achieves an Acc of 98.91%, Sens of 99.01%, Spec of 98.89%, Prec of 92.71%, F1.S of 95.74%, and G.M of 98.94%. The CGSVM achieves an Acc of 92.37%, Sens of 80.21%, Spec of 94.11%, Prec of 66.06%, F1.S of 72.45%, and G.M of 86.88%. The QSVM achieves an Acc of 99.32%, Sens of 99.21%, Spec of 99.23%, Prec of 94.88%, F1.S of 97.37%, and G.M of 99.61%. The LSVM achieves an Acc of 99.11%, Sens of 99.81%, Spec of 99.23%, Prec of 94.87%, F1.S of 97.27%, and G.M of 99.51%.

Discussion
In this manuscript, GIT disease classification is performed. The AGCWD preprocessing technique transforms the pixel intensity level of the endoscopic frames. Features are extracted by the deep CNN models GITNet and InceptionNetV3. After that, the features of both deep models are optimized by the ACO method, which lowers the computation cost of the classifiers. The features selected by the ACO are then fused serially. Four classifiers are used for GIT disease classification: CSVM, CGSVM, QSVM, and LSVM. Multiple experiments are carried out in the proposed work using the KVASIR and NERTHUS datasets, but only the best three are presented. The experiments are carried out by tuning the number of features, and the best results are obtained with 200, 500, and 1000 features. In experiment 1 on the KVASIR dataset, with 200 features, the CSVM provides higher accuracy than the other classifiers. The QSVM classifier provides the best accuracy in experiments 2 and 3, with 98.00% and 99.32%, respectively. The training time of the experiments on the KVASIR dataset using the four classifiers is depicted in Fig 16. Grad-CAM is implemented to represent the learning pattern of the deep learning model. The proposed model is also evaluated on the NERTHUS dataset, where it performs well. The results of the three experiments on the NERTHUS dataset are given in Table 5. A comparison with existing SOTA approaches shows that the performance of the proposed work is better than that of the existing methodologies.

Conclusion and future work
This research work presents an approach for the disease prediction of the GIT using endoscopic images. The work is evaluated on two publicly available datasets, KVASIR and NERTHUS. The AGCWD technique is employed as a preprocessing phase for enhancing the pixel intensity level, which helps to obtain better features for disease prediction. The proposed GITNet is a CNN-based deep learning model that is trained and tested using a third-party dataset, i.e., CIFAR [21]. GITNet and an existing pre-trained model named InceptionNetV3 are employed for feature extraction from the fully connected layers. A bio-inspired approach named ACO is applied for feature subset selection. The selected features are fused serially and given to the SVM-based classifiers for prediction. The classifiers are trained on feature sets of 200, 500, and 1000 features, and the highest accuracy is achieved with 1000 features. The four classifiers assessed during experimentation are LSVM, CSVM, QSVM, and CGSVM. The proposed approach provides a classification accuracy of 99.32% on the KVASIR dataset using QSVM. It also performs well in terms of Acc (99.89%) when evaluated on the NERTHUS dataset. Moreover, the results are compared with existing SOTA approaches. From the reported results, it can be concluded that the CNN-based deep learning models, the proposed GITNet and the pre-trained InceptionNetV3, combined with the feature fusion method give outstanding performance for the classification of GIT images.
In future work, different combinations of hand-crafted and deep feature extraction methods can be evaluated. Different metaheuristic feature selection methods may yield better results. Improved techniques for image preprocessing and enhancement may be integrated into the existing model to improve its performance.

Fig 7. The proposed GITNet model. https://doi.org/10.1371/journal.pone.0292601.g007

The padding values are [0], [same], [2], and [1]. Similarly, the number of filters (filter depth) changes across the convolution layers: 96, 48, 256, 384, and 256. The fully connected layers fc12 and fc13 have a dimension of 1x1x4096, and the fc14 dimension is 1x1x100. The pooling window size and other values are set to a scale of 0.01, and 3x3 max pooling is used. The complete GITNet model is a 50-layer network. The features of different layers of the GITNet model are visualized using the group convolution and convolution layers GC2 (C6), C7, C9, and GC3 (C10). The features of InceptionNetV3 are visualized using the layers conv2d_90, conv2d_75, conv2d_93, and conv2d_94. Fig 8 shows the visualization of the features of both deep learning models, GITNet and InceptionNetV3. Gradient-weighted class activation mapping (Grad-CAM) is employed for the evaluation of the deep learning model. Grad-CAM shows how well the GITNet model has learned the feature patterns. This is confirmed visually by observing whether the network finds the right patterns in the image and activates around them. The Grad-CAM interpretability technique uses the gradients of the prediction score with respect to the final convolutional feature map. Parts of an image with a high Grad-CAM map value have the most impact on the network score for that class. Fig 9 shows the high Grad-CAM map values in the images.
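The Grad-CAM computation described above can be sketched in a few lines. This is an illustrative version of the general technique, not the code used in this work: it assumes the final convolutional feature maps and the gradients of the class score with respect to them are already available as K x H x W nested lists, and the function name `grad_cam` is hypothetical.

```python
def grad_cam(feature_maps, gradients):
    """Return an H x W Grad-CAM map scaled to [0, 1].

    feature_maps, gradients: K x H x W nested lists, where gradients holds
    d(score) / d(feature_maps) for the class of interest.
    """
    k = len(feature_maps)
    h = len(feature_maps[0])
    w = len(feature_maps[0][0])

    # Channel weights: global average of the gradients over each map.
    weights = [sum(sum(row) for row in g) / (h * w) for g in gradients]

    # Weighted sum of the feature maps followed by ReLU.
    cam = [[max(0.0, sum(weights[c] * feature_maps[c][i][j] for c in range(k)))
            for j in range(w)]
           for i in range(h)]

    # Normalize so the strongest location has value 1.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam
```

Locations where positively weighted channels activate strongly receive values near 1, which corresponds to the highlighted regions of the kind shown in Fig 9.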

The four classifiers considered in this research and the results of the individual classifiers are presented in the following sub-sections. The CSVM, CGSVM, QSVM, and LSVM are trained using 5-fold cross-validation. The performance measures reported in the following sections are Accuracy (Acc), Sensitivity (Sens), Specificity (Spec), Precision (Prec), F1 score (F1.S), and Geometric mean (G.M).

Experiment 1: Results with 200 features on KVASIR dataset
In experiment 1, the classifiers are trained and tested using 200 optimized features. The CSVM classifier provides the highest accuracy in this experiment.

Table 5. Evaluation of the model with 200, 500, and 1000 features using the NERTHUS dataset.

GF) random forest approach achieved 95.9% Acc using the KVASIR dataset. Features were integrated into InceptionNetV3 by applying a data augmentation technique, and the result was 91.5% Acc. A feature fusion technique published in 2021 achieved 95.02% Acc. Five deep learning models were used for feature extraction, and feature fusion was applied to achieve 98.3% Acc. The features of two deep learning models were combined to achieve 98% Acc. ResNet-50 was used for feature extraction and attained 95.1% Acc. The analysis of prior work indicates that existing work employed handcrafted features and pre-trained transfer learning with feature fusion methods, which provide lower Acc than our proposed work. In our proposed work, the pre-trained InceptionNetV3 model is used with the proposed GITNet model for feature extraction, achieving 99.32% Acc, which is better than the existing methods.